Finding Fish in Ocean Sounds

A Data-Driven Approach to Marine Acoustic Monitoring

Author

Marine Biodiversity Observation Network (MBON)

Published

September 24, 2025

The Problem: Too Much Data, Too Little Time

Marine biologists use underwater microphones (hydrophones) to monitor fish populations. These devices record ocean sounds 24/7, capturing fish calls, dolphin clicks and whistles, as well as passing ships. The challenge: manually listening through thousands of hours of recordings to find fish calls is extremely time-consuming.

This bottleneck limits how much we can learn about ocean ecosystems. We needed a better way to identify when and where to focus our limited listening time.

Building on Previous Work

Transue et al. (2023) showed that environmental factors like water temperature strongly predict fish vocal activity in Charleston Harbor. Their work used traditional machine learning (Random Forest) to identify environmental patterns.

Our contribution: We asked whether the computed acoustic characteristics could tell us about biological activity beyond what environmental data provides. We combined environmental monitoring with acoustic indices - mathematical measures that capture different aspects of underwater sound complexity.

Our Data: A year of acoustic recordings

We analyzed a year of acoustic monitoring data from three underwater listening stations:

  • 13,100 time periods (2-hour chunks) across the three stations during 2021
  • 10 acoustic indices measuring sound complexity, diversity, and intensity (reduced from an initial set of 60+ indices through correlation analysis and feature selection to avoid redundancy)
  • Environmental data including temperature, depth, and ambient sound levels
  • Expert-verified detections of five fish species and vessel noise:
    • Silver perch (grunts and calls)
    • Oyster toadfish (distinctive “boat whistle” sound)
    • Spotted seatrout (drumming sounds)
    • Atlantic croaker (croaking calls)
    • Red drum (drumming sounds)
    • Vessel noise (for comparison)

Data Alignment Challenge

Different data streams operated on different schedules:

  • Acoustic recordings: Continuous, analyzed in 2-hour blocks
  • Fish detections: Irregular timing based on biological activity
  • Environmental sensors: Various sampling intervals (15 minutes to 1 hour)

We developed a temporal alignment system that matched all data sources to consistent 2-hour time windows.
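The alignment step can be sketched as follows. This is a minimal illustration, not the pipeline used in the analysis. It assumes timezone-naive `datetime` timestamps, windows anchored at midnight, detections summarized by their strongest call level (0-3) per window, and environmental readings averaged per window. The names `window_start` and `align` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

WINDOW_HOURS = 2  # length of the analysis blocks


def window_start(ts, hours=WINDOW_HOURS):
    """Floor a timestamp to the start of its window; windows are anchored at midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=(ts.hour // hours) * hours)


def align(records, agg):
    """Group (timestamp, value) pairs by window and reduce each group with `agg`."""
    groups = defaultdict(list)
    for ts, value in records:
        groups[window_start(ts)].append(value)
    return {start: agg(values) for start, values in sorted(groups.items())}


# Hypothetical inputs: 15-minute temperature readings and two detections (0-3 scale).
temps = [(datetime(2021, 5, 15, 6, 0) + timedelta(minutes=15 * i), 24.0 + 0.1 * i) for i in range(8)]
detections = [(datetime(2021, 5, 15, 6, 40), 1), (datetime(2021, 5, 15, 7, 55), 3)]

temp_by_window = align(temps, mean)       # environmental readings: average per window
calls_by_window = align(detections, max)  # detections: strongest call level per window
```

Passing the reducer as an argument keeps one grouping routine for every data stream; only the summary rule (mean, max, count) changes per source.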

Feature Engineering: Teaching Computers What to Listen For

Marine biologists don’t just listen for individual fish calls - they listen for patterns. Acoustic indices condense some of these patterns into numerical features:

Acoustic Indices:

  • Background noise levels: How quiet or noisy is the ocean?
  • Sound diversity: How many different sound sources are present?
  • Frequency patterns: Are sounds concentrated in low frequencies (large fish) or high frequencies (small fish)?
  • Acoustic complexity: Simple steady sounds vs. complex changing patterns
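To make two of these ideas concrete, here is a minimal sketch of spectral entropy (one way to express "sound diversity" as a single number) and the NDSI (a normalized balance between two frequency bands, one way to capture "frequency patterns"). Both are computed from a single averaged power spectrum. This is an illustration, not the code used in this analysis: the report does not specify its index settings, so the frequency bins and band edges below are hypothetical.

```python
import math


def spectral_entropy(power):
    """Shannon entropy of a power spectrum, scaled to 0-1.

    0 means all energy sits in one frequency bin (a pure tone);
    1 means energy is spread evenly across all bins (broadband noise).
    Assumes at least two bins and some nonzero power.
    """
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(power))


def ndsi(freqs_hz, power, anthro_band, bio_band):
    """Normalized Difference Soundscape Index, (B - A) / (B + A), in [-1, 1].

    A is the summed power in the 'anthrophony' band and B in the 'biophony'
    band; each band is a (low_hz, high_hz) pair, low inclusive, high exclusive.
    """
    a = sum(p for f, p in zip(freqs_hz, power) if anthro_band[0] <= f < anthro_band[1])
    b = sum(p for f, p in zip(freqs_hz, power) if bio_band[0] <= f < bio_band[1])
    return (b - a) / (b + a)


# Hypothetical averaged spectrum: bin centre frequencies (Hz) and power.
freqs = [100, 300, 500, 700]
power = [1.0, 1.0, 3.0, 3.0]
print(ndsi(freqs, power, anthro_band=(0, 400), bio_band=(400, 800)))  # 0.5
```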

Environmental Context:

  • Water temperature and recent changes
  • Water depth, which mainly captures tidal fluctuations
  • Seasonal timing: Day of year and time of day
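These environmental features can be derived from the aligned 2-hour series. The sketch below shows two plausible constructions, both assumptions rather than the report's documented method: temperature change relative to one and three windows earlier (2 and 6 hours, matching the range later cited as a strong secondary predictor), and a sine/cosine encoding of day of year and time of day so that, for example, 23:00 and 01:00 end up close together.

```python
import math


def temperature_changes(temps, lags=(1, 3)):
    """Temperature change relative to `lag` windows earlier, for each lag.

    `temps` is one reading per consecutive 2-hour window, so lag 1 = 2 hours
    and lag 3 = 6 hours. Windows without enough history get None.
    """
    return {
        lag: [None if i < lag else temps[i] - temps[i - lag] for i in range(len(temps))]
        for lag in lags
    }


def cyclic_encoding(value, period):
    """Place a cyclic quantity (hour of day, day of year) on the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


temps = [20.0, 20.5, 21.0, 21.0, 20.5]  # hypothetical readings, one per window
print(temperature_changes(temps)[1])      # [None, 0.5, 0.5, 0.0, -0.5]
```

Encoding time as a point on a circle avoids the artificial jump a model would otherwise see between hour 23 and hour 0, or between 31 December and 1 January.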

The Machine Learning Journey: Systematic Attempts and Instructive Failures

Phase 1: Species-Specific Prediction (Notebook 4)

Hypothesis: Acoustic indices can predict individual species calling patterns

Approach: Direct prediction of species activity levels (0-3 scale) using acoustic indices

Result: Complete failure. Models couldn’t reliably predict when individual species would call

Lesson: Individual species patterns are too complex and irregular for generic acoustic indices to capture

Phase 2: Vessel Analysis and Signal Cleaning (Notebook 5)

Hypothesis: Removing vessel noise will reveal clearer biological patterns
Approach:

  • Train models to detect vessel presence using acoustic indices
  • Compare fish-acoustic correlations with/without vessel periods

Results:

  • Vessel detection: 85.4% accuracy (Logistic Regression), the one clear success of this phase
  • Signal improvement: Only 8.1% improvement in biological correlations when vessels removed
  • Vessel presence: 20.7% of monitoring periods had vessel activity
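The 85.4% figure is easier to judge next to a trivial baseline. If the evaluation data had the same vessel prevalence as the monitoring periods overall (20.7%; this is an assumption, since the test-set prevalence is not given), a classifier that always predicts "no vessel" would already reach 79.3% accuracy. The sketch below computes that majority-class baseline and balanced accuracy (the mean of the per-class hit rates), which does not reward ignoring the minority class. The labels are invented.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def balanced_accuracy(labels, preds):
    """Mean of sensitivity (vessel periods caught) and specificity (vessel-free periods kept)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)


# Invented labels: 1 = vessel present, at roughly the monitoring-data prevalence.
labels = [1] * 21 + [0] * 79
always_quiet = [0] * 100
print(majority_baseline_accuracy(labels))       # 0.79
print(balanced_accuracy(labels, always_quiet))  # 0.5
```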

Lesson: Vessel noise wasn’t the primary limiting factor for biological signal detection.

Phase 3: Community-Level Detection (Notebooks 6-6.04)

Hypothesis: Community-level patterns are more detectable than species-specific ones
Approach: Multiple systematic iterations:

  • Basic community activity metrics (total activity, # active species)
  • Enhanced feature engineering with environmental variables
  • Systematic feature selection comparison (Mutual Information vs Boruta)
  • Temporal modeling with lag features and proper time series validation

Results: Mixed success with critical problems revealed

Best Case Performance (Standard Cross-Validation):

  • “Any biological activity” detection: F1 = 0.84 (Random Forest)
  • “High activity” detection: F1 = 0.71 (Decision Tree)
  • Multi-species activity: F1 = 0.58 (Decision Tree)
  • Screening potential: 60-80% effort reduction while catching 68-89% of biological activity

Critical Problems Discovered:

1. Feature Selection Instability

Mutual Information and Boruta (the method used by Transue et al.) selected almost completely different features:

  • Agreement rate: Only 0-30% overlap in top features across targets
  • “Any activity” target: 0% agreement between methods
  • “High activity” targets: Only 20% agreement
  • Single consensus feature: ACTspFract appeared in both methods for some targets

This disagreement suggested either:

  • The biological signal was too weak or unstable
  • Different methods were capturing noise rather than robust signal
  • Complex biological relationships that no single method could capture reliably
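The agreement percentage is not defined precisely above. One simple reading, assumed here, is the fraction of the top-k features that the two methods share. The rankings below are invented for illustration and do not reproduce the actual selections.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Fraction of the top-k features shared by two rankings (0 = none, 1 = same set)."""
    shared = set(ranking_a[:k]) & set(ranking_b[:k])
    return len(shared) / k


# Invented rankings (best first) for one target.
mi_rank = ["Temp", "ACTspFract", "BGN", "NDSI", "H"]
boruta_rank = ["ACTspFract", "LEQ", "Depth", "AEI", "ADI"]
print(top_k_overlap(mi_rank, boruta_rank, k=5))  # 0.2
```

If the two methods select different numbers of features, the Jaccard index (shared features divided by the union) is a common alternative.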

2. Temporal Validation Problem

When proper time-series validation replaced random train/test splits, performance dropped, sharply for some models:

  • Logistic Regression: F1 dropped from 0.825 to 0.633 (23% performance loss)
  • Random Forest: F1 dropped from 0.843 to 0.788 (6.5% performance loss)
  • Temporal leakage detected: Models were “cheating” by using future information to predict past events
  • Real-world performance: Likely lower than the random-split estimates initially reported
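The difference between the two validation schemes is easiest to see in code. A random split scatters test windows among training windows, so a model can learn from the hours just before and after each test period. The sketch below instead generates expanding-window folds for time-ordered 2-hour windows: every test block lies strictly after its training data, and an optional gap discards the windows just before each test block. It is a minimal stand-in written for illustration (scikit-learn's `TimeSeriesSplit`, with its `gap` parameter, implements the same idea); the fold count and gap are arbitrary, not the settings used in Notebooks 6-6.04, and it assumes a single time-ordered series.

```python
def expanding_window_splits(n_samples, n_splits, gap=0):
    """Yield (train_indices, test_indices) for time-ordered samples.

    Each test block lies strictly after its training data, so the model never
    trains on the future. `gap` drops that many samples just before each test
    block, limiting leakage from autocorrelation between neighbouring windows.
    """
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        test_start = i * fold
        test_end = test_start + fold if i < n_splits else n_samples
        train_end = max(0, test_start - gap)
        yield list(range(train_end)), list(range(test_start, test_end))


# 12 consecutive 2-hour windows, 3 folds, 1-window gap.
for train, test in expanding_window_splits(12, n_splits=3, gap=1):
    print(train, test)
# [0, 1] [3, 4, 5]
# [0, 1, 2, 3, 4] [6, 7, 8]
# [0, 1, 2, 3, 4, 5, 6, 7] [9, 10, 11]
```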

3. Temperature Dominance Problem

  • Temperature consistently ranked as the top predictor across all models
  • Environmental data alone might be sufficient for most predictions
  • Acoustic indices provided minimal additional value over temperature trends
  • Question emerged: “Do we need acoustic monitoring if temperature predicts biological activity?”

4. Limited Generalization

  • Models didn’t transfer reliably across monitoring stations
  • Seasonal patterns appeared station-specific
  • No evidence of robust, transferable acoustic-biological relationships

The Fundamental Challenge

After systematic testing across multiple approaches, algorithms, and validation methods, traditional machine learning revealed more problems than solutions:

  • No reliable feature set - different selection methods largely disagreed
  • Temporal structure matters - random validation overestimated real performance (by up to 23% in F1)
  • Environmental variables dominate - acoustic indices added little value beyond temperature
  • Limited transferability - models were site and time-specific

Figure 2a: Species-Specific Prediction Failures. Scatter plots showing failed attempts to predict individual species calling patterns using acoustic indices. Poor correlations (R < 0.1) demonstrate why this approach was abandoned.

Figure 2b: Feature Selection Disagreement Crisis. Mutual Information and Boruta feature selection methods showed 0-30% agreement across biological targets, indicating unstable or weak biological signals in the data.

Figure 2c: Temporal Validation Performance Collapse. When proper time-series validation replaced random cross-validation, performance dropped 6-23%, revealing that models were “cheating” by using future information to predict past events.

Figure 2d: ML Journey Summary. Timeline showing the systematic progression of ML attempts from initial failures through the breakthrough that led to the pattern-based guidance system.

The critical realization: We were asking the wrong question.

The Breakthrough: From Prediction to Guidance

The Pattern Discovery

The breakthrough came when we started looking at our acoustic index heat maps alongside manual detection patterns. The visual similarity was striking - certain acoustic indices seemed to “light up” in the same temporal patterns as fish detections. This suggested there were discoverable patterns hiding in the data.

Changing the Question

Seeing these patterns prompted us to reframe our approach. Instead of “Will fish be calling tomorrow?” we asked “Given that it’s May 15th at 6 AM, how likely is it that fish are calling right now, and should we focus our listening time here?”

This led us to systematically visualize our data as 2D probability surfaces mapping fish activity likelihood across:

  • Day of year (seasonal patterns)
  • Time of day (daily rhythms)

The Hidden Patterns Revealed

When we plotted fish detection data this way, remarkably clear patterns emerged:

  • Silver perch: Sharp seasonal peaks with consistent daily patterns
  • Oyster toadfish: Tight spring spawning windows with dawn/dusk activity
  • Spotted seatrout: Summer-focused activity with midday peaks

The following heatmaps show how visually similar some indices are to the seasonal timing of fish calls.

Figure 4: Acoustic Index vs Manual Detection Comparison. Side-by-side heat maps showing manual detection patterns and acoustic indices. The visual similarity between these patterns was our first hint that there were discoverable temporal patterns in the data.

Figure 5: Species-Specific 2D Probability Surfaces. Heat maps for each species showing probability of detection across day-of-year vs time-of-day. This systematic approach revealed the clear seasonal and daily patterns that became the foundation of our guidance system.

Feature Analysis: What Matters Most

Correlation Analysis

We used mutual information to measure how much information each acoustic index carries about fish calling behavior. Unlike correlation, mutual information also captures nonlinear relationships.

Top Predictive Acoustic Indices:

  • BGN (Background Noise): Quieter periods often coincide with fish calls
  • NDSI (Normalized Difference Soundscape Index): Balance between biological and human sounds
  • Shannon Diversity: More diverse soundscapes often contain fish calls
  • LEQ (Sound Level): Specific sound intensity ranges associated with biological activity
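For discrete or binned variables, mutual information can be computed directly from counts, as in the sketch below; rank-based equal-count binning is an assumption here. The estimator behind the scores in this section is not stated. scikit-learn's `mutual_info_classif`, for example, uses a nearest-neighbour estimator for continuous features instead of binning, so absolute values from this sketch need not match the figures quoted here. The data are invented.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete variables, from joint counts."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def quantile_bins(values, n_bins=4):
    """Discretize a continuous index into roughly equal-count bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Invented example: a background-noise index and 0/1 fish detections for eight windows.
bgn = [92.0, 88.5, 95.1, 86.0, 90.2, 87.3, 96.4, 85.1]
calls = [0, 1, 0, 1, 0, 1, 0, 1]
print(mutual_information(quantile_bins(bgn, n_bins=2), calls))  # 0.693... (= ln 2)
```

In this toy case every quiet window has a call, so one bit (ln 2 nats) of information is shared; real indices share far less.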

Environmental vs. Acoustic Variables

Environmental Variables:

  • Water temperature: Mutual information = 0.17
  • Temperature changes over 2-6 hours: Strong secondary predictors

Acoustic Indices:

  • Best acoustic indices: Mutual information = 0.09-0.12
  • Multiple indices needed to match single temperature measurement
  • But they provide complementary information - not redundant. For example, temperature tells us about biological readiness to call, while acoustic indices tell us about current soundscape conditions that might facilitate or mask calling behavior. When we combined both types of features, model performance improved by 12-15% over using either alone.

Figure 6: Feature Importance Comparison. Horizontal bar chart showing mutual information scores for environmental variables vs acoustic indices, demonstrating how they complement each other.

The Detection Guidance System

How It Works

  1. Pattern Learning: Analyze manual detection data to build species-specific 2D probability surfaces
  2. Environmental Enhancement: Adjust base probabilities using current conditions
  3. Priority Ranking: Rank all time periods by detection probability
  4. Guided Monitoring: Focus manual efforts on highest-ranked periods
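The four steps can be sketched as a small ranking function. The rule for step 2 is not described, so `warm_water_boost` (its threshold and factor) is purely hypothetical, as is the toy probability surface `spring_dawn`; in practice step 1 would look up the learned 2D surface.

```python
def rank_for_review(periods, base_prob, adjust=None, effort=0.20):
    """Return the IDs of the highest-priority periods, covering `effort` of all periods.

    periods:   list of (period_id, day_of_year, hour_of_day, conditions)
    base_prob: function(day, hour) -> probability from the learned 2D surface (step 1)
    adjust:    optional function(prob, conditions) -> adjusted probability (step 2)
    """
    scored = []
    for period_id, day, hour, conditions in periods:
        p = base_prob(day, hour)
        if adjust is not None:
            p = adjust(p, conditions)
        scored.append((p, period_id))
    # Step 3: rank by probability (the sort is stable, so ties keep input order).
    scored.sort(key=lambda item: item[0], reverse=True)
    # Step 4: keep the top fraction for manual review.
    n_review = round(effort * len(scored))
    return [period_id for _, period_id in scored[:n_review]]


def warm_water_boost(p, conditions, threshold_c=20.0, factor=1.5):
    """Hypothetical step-2 rule: raise the probability when water is warmer than a threshold."""
    return min(1.0, p * factor) if conditions["temp_c"] > threshold_c else p


def spring_dawn(day, hour):
    """Toy surface: a species calling before 08:00 between days 120 and 150."""
    return 0.6 if 120 <= day <= 150 and hour < 8 else 0.1


periods = [
    ("a", 130, 6, {"temp_c": 22.0}),
    ("b", 130, 14, {"temp_c": 18.0}),
    ("c", 300, 6, {"temp_c": 25.0}),
    ("d", 40, 6, {"temp_c": 12.0}),
    ("e", 135, 6, {"temp_c": 19.0}),
]
print(rank_for_review(periods, spring_dawn, warm_water_boost, effort=0.6))  # ['a', 'e', 'c']
```

Keeping the adjustment as a separate callable means the seasonal surface and the environmental rule can be validated independently.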

Future enhancement idea: The system could potentially flag “unusual” periods - times when acoustic conditions suggest high biological activity but it’s outside typical seasonal patterns, possibly indicating environmental changes, migration events, or other anomalies worth investigating.
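Because this is a proposed extension, the following is only a sketch of how such a flag might work. It assumes some acoustic activity score is available (not something the current system produces) and uses arbitrary placeholder thresholds; `flag_unusual` and `spring_prior` are hypothetical names.

```python
def flag_unusual(periods, seasonal_prob, acoustic_score, prior_max=0.1, score_min=0.8):
    """Return IDs of periods that sound busy but fall outside the usual season/time of day.

    periods:        list of (period_id, day_of_year, hour_of_day, acoustic_features)
    seasonal_prob:  function(day, hour) -> probability from the learned 2D surface
    acoustic_score: function(features) -> 0-1 score of current biological activity
    """
    return [
        period_id
        for period_id, day, hour, features in periods
        if seasonal_prob(day, hour) <= prior_max and acoustic_score(features) >= score_min
    ]


# Hypothetical usage: a spring-calling species and a precomputed acoustic activity score.
def spring_prior(day, hour):
    return 0.6 if 120 <= day <= 150 else 0.05


periods = [("x", 300, 6, {"score": 0.9}), ("y", 130, 6, {"score": 0.9}), ("z", 300, 6, {"score": 0.2})]
print(flag_unusual(periods, spring_prior, lambda f: f["score"]))  # ['x']
```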

[FIGURE 7: Detection Guidance System Workflow] Suggested visualization: Flowchart showing the four-step process from historical pattern learning through guided monitoring, with example data/visualizations at each step.

Validation Methodology

Cross-Station Validation Approach

Given that our data spans only one year (2021), we couldn’t use traditional year-over-year train/test splits (training on one year and testing on another). Instead, we used cross-station validation to test model transferability across spatial locations:

The Three-Station System:

  • Station 37M: Mouth of river
  • Station 9M: Furthest up river
  • Station 14M: In between the other two

Validation Protocol:

  1. Train on 2, test on 1: For each validation round, we trained our models on data from two stations and tested performance on the third station
  2. All combinations tested: We ran three validation scenarios:
    • Train: Stations 37M + 9M → Test: Station 14M
    • Train: Stations 37M + 14M → Test: Station 9M
    • Train: Stations 9M + 14M → Test: Station 37M
  3. Average performance: Final metrics represent the average across all three cross-station scenarios

This approach tests whether patterns learned at one location can predict biological activity at different locations - a critical requirement for real-world deployment of guidance systems.
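The protocol amounts to leave-one-station-out cross-validation. Here is a minimal sketch, with `fit` and `evaluate` as hypothetical placeholders for whichever model and metric are being tested; in the toy usage the "model" simply predicts the training mean.

```python
from statistics import mean


def cross_station_scores(data_by_station, fit, evaluate):
    """Leave-one-station-out validation: train on all other stations, test on the held-out one.

    data_by_station: dict mapping station name -> list of records
    fit:             function(training_records) -> model
    evaluate:        function(model, test_records) -> score
    Returns per-station scores and their average.
    """
    scores = {}
    for test_station, test_records in data_by_station.items():
        train = [rec for station, recs in data_by_station.items() if station != test_station for rec in recs]
        model = fit(train)
        scores[test_station] = evaluate(model, test_records)
    return scores, mean(scores.values())


# Toy example: the "model" predicts the training mean; the score is negative mean absolute error.
data = {"37M": [1.0, 3.0], "9M": [5.0, 7.0], "14M": [2.0, 2.0]}
scores, average = cross_station_scores(
    data,
    fit=mean,
    evaluate=lambda model, recs: -mean(abs(r - model) for r in recs),
)
print(scores)  # {'37M': -2.0, '9M': -4.0, '14M': -2.0}
```

With three stations this loop produces exactly the three train/test scenarios listed in the protocol, and the returned average corresponds to the final reported metric.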

Performance Results

Using our cross-station validation approach:

Silver Perch: Best Performance

  • 86.6% of detections found by checking only top 20% of time periods
  • 80% reduction in manual effort
  • Consistent performance across stations

Oyster Toadfish: Strong Seasonal Patterns

  • 69.2% detection efficiency at 20% effort
  • AUC = 0.944: Excellent discrimination
  • Correctly identified known spawning seasons

Spotted Seatrout: Good Generalization

  • 67.9% detection efficiency at 20% effort
  • AUC = 0.887: Strong cross-station performance
  • Summer patterns consistent across locations

Figure 8: Cross-Station Validation Results. Performance metrics showing detection efficiency and AUC scores for each species across cross-station validation scenarios.
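The two metrics reported above can be computed from per-period scores and detection labels. This sketch assumes "detection efficiency" means the share of detection-positive periods (or detection counts, if `detections` holds counts) that fall in the top-ranked fraction of periods; the exact definition is not given. AUC is computed in its rank form, as the probability that a randomly chosen period with detections outscores one without. The scores and labels are invented.

```python
def detection_efficiency(scores, detections, effort=0.20):
    """Share of all detections found when reviewing only the top `effort` fraction of periods."""
    ranked = sorted(zip(scores, detections), key=lambda item: item[0], reverse=True)
    n_review = round(effort * len(ranked))
    return sum(d for _, d in ranked[:n_review]) / sum(detections)


def auc(scores, labels):
    """ROC AUC: chance a random positive period outscores a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented scores for ten periods, with 1 = fish detected.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
print(detection_efficiency(scores, labels))  # 0.5
print(auc(scores, labels))                   # 0.75
```

The pairwise AUC loop is quadratic in the number of periods, which is fine for a sketch; rank-based formulas scale better on a full year of data.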

Why Some Species Worked Better

High Performers (Silver perch, Oyster toadfish):

  • Strong seasonal patterns
  • Consistent daily rhythms
  • Sufficient detection events for pattern learning
  • Similar behavior across locations

Challenging Cases (Atlantic croaker, Red drum):

  • Irregular temporal patterns
  • Lower detection rates
  • High variability between stations

[FIGURE 9: Species Performance vs Behavioral Predictability] Suggested visualization: Scatter plot with detection efficiency on y-axis vs some measure of temporal regularity (e.g., seasonal consistency score) on x-axis, with species labeled. Helps explain why some species worked better than others.

Real-World Impact

Time Savings

Traditional Approach:

  • 3 months of monitoring ≈ 90 days × 12 two-hour periods per day = 1,080 periods to check per station
  • Assuming about 5 minutes of review per period (an illustrative estimate, not a measured figure): 1,080 × 5 minutes = 90 hours of expert time

Guided Approach:

  • Check only the top 20% of periods = 216 periods
  • Same 5 minutes per period = 18 hours
  • Time savings: 72 hours per station (80% reduction); the hours scale with the assumed review time, but the 80% reduction does not
  • Detection rate: Still capture roughly 68-87% of detections for the three best-performing species

Figure 10: Time Savings Analysis. Before/after comparison showing traditional vs guided monitoring approach, demonstrating 80% reduction in manual effort while maintaining high detection rates.

Conservation Applications

  • Spawning season monitoring: Identify critical reproductive periods
  • Climate change research: Track shifts in seasonal timing
  • Marine protected area management: Optimize monitoring schedules
  • Ecosystem health assessment: Use vocal activity as population indicator

Scaling Impact

Our system addresses a major bottleneck in marine conservation: the gap between data collection and actionable insights. By reducing manual workload by 80% while still catching most detections for the best-performing species, we make large-scale acoustic monitoring feasible.

What We Learned

Biological Insights

  • Temperature is key: Confirms and extends previous findings about temperature’s role in fish behavior
  • Species-specific patterns: Each species has distinct temporal rhythms reflecting their ecology
  • Seasonality as a feature: Embracing seasonal patterns rather than fighting them led to breakthroughs
  • Acoustic indices add value: While temperature dominates, acoustic measures provide complementary information

Technical Lessons

  • 2D pattern recognition: Moving from time series to probability surfaces revealed hidden patterns
  • Cross-station validation: Testing across locations gives realistic performance estimates
  • Biological feature engineering: Incorporating expert knowledge improves results
  • Guidance vs. prediction: Pattern recognition proved more practical than predictive modeling

Methodological Insights

  • Work with biology: Systems aligned with natural patterns are more robust
  • Validation strategy matters: How you test determines real-world performance
  • Simple can be better: Our approach used kernel density estimation rather than complex deep learning
  • Ask the right questions: The breakthrough came from reframing the problem, not better algorithms

Limitations and Future Work

Current Limitations

  • Single-year data: Based on 2021 only - need multi-year validation
  • Geographic scope: Limited to coastal environments
  • Species coverage: Works best for species with strong patterns and sufficient detections
  • Manual validation: Still requires expert verification

Next Steps

Near-term:

  • Multi-year validation as indices become available for the 2018 data
  • Compute long-term spectral averages (LTSAs) and decidecade band levels to test whether they improve the automated guidance process

Long-term:

  • Regional network deployment
  • Climate change monitoring using pattern shifts
  • Integration with automated detection algorithms
  • Combination with other monitoring methods

Conclusions

This work shows that marine acoustic monitoring can transform from a manual bottleneck into a scalable research tool. By revealing temporal patterns in ocean soundscapes, we’ve created new possibilities for understanding marine ecosystem dynamics.

The 80% reduction in manual effort while maintaining high detection rates makes comprehensive acoustic monitoring feasible for tracking biodiversity trends within a region. This efficiency gain enables ecosystem-wide monitoring, rapid assessment of conservation measures, and early detection of environmental changes.

As ocean monitoring networks expand, tools like this detection guidance system will become essential for marine conservation. The ocean generates acoustic data faster than we can analyze it, but with the right approach to guide our listening, we can finally process what the ocean has been telling us.


References

Transue, B., et al. (2023). Biological and anthropogenic soundscape of an urbanized port: Charleston Harbor. Marine Environmental Research.

[Additional references based on complete analysis pipeline]


This report demonstrates how data-driven approaches to temporal pattern recognition can transform marine ecosystem monitoring.